ABSTRACT


Poor motivation is commonly cited as a concern when interpreting results from low-stakes 
standardized tests administered to postsecondary students. This study investigates the 
associations between test administration procedures and students’ self-reported effort and 
performance on the Collegiate Learning Assessment (CLA), an open-ended test of college 
students’ critical-thinking and writing skills. Coefficient estimates from a series of hierarchical 
linear models revealed that paying students to take tests and offering performance-based 
incentives were positively associated with effort and performance. Mandatory testing, however, 
was negatively associated with effort and performance. Faculty involvement in recruiting, 
giving extra course credit, and offering prize raffle entries were not associated with effort or 
performance. Effort appeared to mediate the relationship between some test administration 
variables (e.g., payment and mandatory testing) and performance. 
TEST ADMINISTRATION PROCEDURES AND THEIR
RELATIONSHIPS WITH EFFORT AND PERFORMANCE 
ON A COLLEGE OUTCOMES TEST 


Motivation to perform well on a test is a potentially problematic source of construct-irrelevant 
variance (Messick, 1995), especially when there are no stakes attached to performance. That 
is, interpretations of test scores as indicators of examinees’ knowledge and skills may be 
compromised if those examinees do not put forth the effort necessary to demonstrate the full 
extent of their abilities. Low-stakes testing is commonplace in K-12 education in the United 
States, and concerns over suspect motivation are exacerbated when aggregate test results 
are used to measure teacher or school effectiveness (e.g., Guerriero, 2013). In postsecondary 
education, tests such as CAE’s Collegiate Learning Assessment (CLA), ACT’s Collegiate 
Assessment of Academic Proficiency, and the ETS Proficiency Profile are typically administered 
under low-stakes conditions to provide evidence of student learning for stakeholders, 
prospective students, and accreditors. 


Poor motivation is commonly raised in critiques of postsecondary institutional assessment 
programs (Banta, 2008), and some colleges cite this concern as a reason for not using 
assessments such as the CLA. Statistical approaches to adjusting results for low motivation 
have been proposed (e.g., Sundre & Wise, 2003), but it would be better if no such adjustments 
were necessary. In addition to psychological variables, such as achievement goals and 
personality (Barry, Horst, Finney, Brown, & Kopp, 2010) and the format of the test (Sundre, 
1999; Wise, 2006), test administration procedures potentially impact student motivation on 
tests. Indeed, inconsistencies in methods of recruiting and incentivizing examinees have been 
suggested as causes of variation in results across administrations (Hosch, 2010), and rigorous 
proctor training has been shown to increase self-reported effort (Lau, Swerdzewski, & Jones, 
2009). 


This study investigates test administration procedures and their possible associations with 
effort and performance on the CLA, a test of college students’ critical-thinking and writing 
skills. In 2008, a post-administration survey was delivered to administrators at all participating 
institutions. This survey asked whether CLA testing was mandatory, whether faculty were 
involved in recruiting students, and how students were incentivized. In this study, a series of 
hierarchical linear models were employed to determine whether test administration procedures 
were significantly associated with test performance and self-reported effort after controlling 
for prior ability. Results could inform best practices for recruiting and incentivizing students
to maximize motivation, thereby improving the validity of inferences from results of low-stakes
college outcomes tests.


LITERATURE REVIEW 


Prior research has shown that attaching consequences to test performance can have
large effects on motivation and test performance (Liu, Bridgeman, & Adler, 2012; Napoli &
Raymond, 2004; Wise & DeMars, 2005; Wolf & Smith, 1995). Tests like the CLA, however, are 
almost exclusively administered in low-stakes contexts, so test administrators must rely on 
alternate means of fostering motivation. For example, it has been recommended to stress the 
importance and usefulness of a test because those feelings are correlated with effort, which 
partly mediates test performance (Cole, Bergin, & Whittaker, 2008). Alternatively, instilling a 
sense of competition with rival schools may have some effect on test performance (Bracey, 
1996). 


Uniformly high motivation has been reported in some international testing contexts (Baumert & 
Demmrich, 2001; Eklöf, 2007), with examinees reporting social responsibility, competitive spirit,
interest, and personal or intrinsic motivators as the reasons for their motivation (Eklöf, 2008).
Such findings sharply contrast with U.S. students taking low-stakes tests like the National 
Assessment of Educational Progress, where a large percentage of students report a low sense 
of importance and low effort (ETS, 1993). 


Although monetary incentives have been effective for improving test scores in some content
areas (Bettinger, 2010), performance-contingent financial rewards appear to be more effective
than a reward for simple test completion (Braun, Kirsch, & Yamamoto, 2011). In fact, bigger
financial incentives appear to be more effective than smaller financial incentives on math
and reading tests, but only if given at the time of testing (Levitt, List, Neckerman, & Sadoff,
2011). Even offering additional entries in a prize lottery for high math test scores has been 
effective in increasing self-reported effort and performance (Cole, 2007). The effectiveness of 
performance-contingent rewards, however, is not universal. In a series of studies, neither small 
nor large incentives improved 12th graders’ math test performance (O’Neil, Abedi, Miyoshi, & 
Mastergeorge, 2005; O’Neil, Sugrue, & Baker, 1995/1996). 


This study supplements prior research by examining the associations between monetary 
incentives (payment, performance-based incentives, and raffle entries), self-reported effort, 
and performance on a low-stakes college outcomes test. Additionally, this study included 
several variables not previously studied: the involvement of faculty in student recruitment, 
mandatory (versus voluntary) testing, and offering course extra credit. Separate analyses were 
conducted for freshmen and seniors due to different administration procedures for those two 
groups. Moreover, self-reported effort among older students may differ from that of younger
students (Kiplinger & Linn, 1995/1996; Liu et al., 2012).


METHOD 


Sample 

The sample included 5,428 entering freshmen and 4,611 graduating seniors at 102 four-year 
colleges and universities (24 research universities, 47 master’s colleges and universities, 30 
baccalaureate colleges, and one seminary). Of those schools, 41 had a total enrollment greater 
than 10,000 students, 58 were public institutions, and four were classified as Historically Black 
Colleges and Universities. In terms of geographic distribution, 15 were located in the
mid-Atlantic or New England regions, 15 in the Great Lakes or plains, 38 in the southeast, and 34 in
the west or southwest. They had admissions rates ranging from 18% to 100% (median 66%), 
six-year graduation rates ranging from 19% to 92% (median 55%), and a percentage of White 
students ranging from 5% to 95% (median 68%). Because the statistical analyses controlled for 
students’ prior academic abilities, only students with SAT or ACT scores on record were included 
in the data set. 


Measures 

CLA Performance Task (PT). Students had 90 minutes to analyze a set of documents 
representing a real-world problem and answer a series of essay questions asking them to 
analyze the provided information and then propose a solution. Students were randomly 
assigned one of many possible PTs. A group of trained scorers evaluated the responses using 
scales that described the quality of analysis, problem solving, and writing effectiveness and 
mechanics. Inter-rater correlations on PT total scores were typically around .85. 


Self-reported effort. Following the PT, students completed a brief questionnaire, including an 
item that asked students how much effort they put into the test. The response options included 
“Made little effort,” “Made some effort,” “Mainly did my best,” and “Tried my best.” 


Prior ability. As part of the regular CLA administration, SAT and ACT scores were collected 
from college registrars’ offices. ACT total scores were converted to the SAT score scale using a 
concordance table (ACT, 2008). 


Post-administration survey. Completed surveys were received from 102 (52%) of the 195 
institutions participating in the 2007-2008 CLA administration. Relevant to this analysis, the 
survey asked whether students were required to take the CLA, whether faculty participated 
in CLA outreach to students, and how participants were incentivized. The incentives included 
in this analysis were payment (money, gift certificate, or university cash), performance-based 
incentives (e.g., the top five scorers receive $100), raffle entry, and course extra credit. Other 
incentives were excluded from analyses because they were employed by fewer than 10% of 
institutions (food, priority course registration, or none). 
Analysis
A series of five HLMs were fit to the data. The unconditional model was first fit to estimate the 
variance in PT scores that is between and within schools. 


Level 1: $PT_{ij} = \beta_{0j} + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$


The model shown above treats students (level 1) as nested within institutions (level 2). At the
student level, $PT_{ij}$ is the PT score of student i at school j, $\beta_{0j}$ is the mean PT score
at school j, and $r_{ij}$ is the student-level residual. At the school level, $\beta_{0j}$ is modeled
as the grand mean PT score, $\gamma_{00}$, plus the school-level residual, $u_{0j}$.
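
The between-school share of variance estimated from the unconditional model is the intraclass correlation, the school-level variance divided by the sum of the school-level and student-level variances. As a rough illustration only, the sketch below computes that share with the classical one-way ANOVA (method-of-moments) estimator rather than with HLM software, and it assumes the same number of students in every school. The function name `anova_icc` and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def anova_icc(groups):
    """Share of score variance lying between schools (one-way ANOVA estimator).

    groups: dict mapping a school label to a list of student scores.
    Assumption of this sketch: every school has the same number of students.
    """
    sizes = {len(scores) for scores in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()          # students per school
    k = len(groups)          # number of schools

    grand = mean(x for scores in groups.values() for x in scores)
    school_means = {s: mean(scores) for s, scores in groups.items()}

    # Mean squares between and within schools.
    ms_between = n * sum((m - grand) ** 2 for m in school_means.values()) / (k - 1)
    ms_within = sum(
        (x - school_means[s]) ** 2 for s, scores in groups.items() for x in scores
    ) / (k * (n - 1))

    # Method-of-moments estimate of the school-level variance (floored at zero).
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between / (var_between + ms_within)


# Hypothetical toy data: three schools, three students each.
toy_scores = {
    "A": [1000, 1100, 1200],
    "B": [1100, 1200, 1300],
    "C": [900, 1000, 1100],
}
print(round(anova_icc(toy_scores), 3))
```

With unequal school sizes, as in the actual sample, a likelihood-based multilevel estimator would be the more appropriate choice.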


Next, a conditional model including only grand-mean centered SAT scores at level 1 was fit to 
see how much of the between-school variance was accounted for by SAT scores. 


Level 1: $PT_{ij} = \beta_{0j} + \beta_{1j}(SAT_{ij} - \overline{SAT}) + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
         $\beta_{1j} = \gamma_{10}$


$SAT_{ij}$ is the SAT score of student i at school j. Note that grand-mean centering has the effect of
making $\beta_{0j}$ equal to school j’s PT mean adjusted for SAT scores.
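
The effect of grand-mean centering on the intercept can be seen in a simplified single-level analogue. The sketch below uses ordinary least squares on hypothetical scores, not the multilevel model itself, to show that centering the predictor leaves the slope unchanged while moving the intercept to the expected outcome at the average SAT score. The helper names and all values are illustrative assumptions.

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the overall mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def ols_line(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical SAT and PT scores for five students.
sat = [900, 1000, 1100, 1200, 1300]
pt = [950, 1050, 1080, 1200, 1270]

raw_intercept, raw_slope = ols_line(sat, pt)
centered_intercept, centered_slope = ols_line(grand_mean_center(sat), pt)

# Uncentered: the intercept is the predicted PT score at SAT = 0 (not meaningful).
# Centered: the intercept is the predicted PT score at the mean SAT score.
print(raw_intercept, raw_slope)
print(centered_intercept, centered_slope)
```

In the two-level model the same logic applies school by school, which is why the school intercept can be read as a mean PT score adjusted for SAT.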


Model 3 added test administration dummy variables at the school level (level 2). 
Level 1: $PT_{ij} = \beta_{0j} + \beta_{1j}(SAT_{ij} - \overline{SAT}) + r_{ij}$
Level 2:
$\beta_{0j} = \gamma_{00} + \gamma_{01}MAN_j + \gamma_{02}FAC_j + \gamma_{03}PAY_j + \gamma_{04}PER_j + \gamma_{05}RAF_j + \gamma_{06}CRE_j + u_{0j}$
$\beta_{1j} = \gamma_{10}$


Here, adjusted mean PT scores ($\beta_{0j}$) are modeled as a function of six dummy variables indicating
whether testing was mandatory ($MAN_j$), faculty were involved in recruitment ($FAC_j$), students
were paid ($PAY_j$), performance-based incentives were employed ($PER_j$), raffle entries were
offered ($RAF_j$), and course extra credit was offered ($CRE_j$) at school j.
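
For readers who prefer a single equation, substituting the level 2 equations into the level 1 equation gives the combined (mixed-model) form of Model 3. This is a standard algebraic rearrangement written in the notation defined above, offered as a reading aid rather than as an additional model.

```latex
PT_{ij} = \gamma_{00}
        + \gamma_{01} MAN_j + \gamma_{02} FAC_j + \gamma_{03} PAY_j
        + \gamma_{04} PER_j + \gamma_{05} RAF_j + \gamma_{06} CRE_j
        + \gamma_{10} (SAT_{ij} - \overline{SAT})
        + u_{0j} + r_{ij}
```

Read this way, each school-level coefficient (for example, $\gamma_{03}$ for payment) is the expected difference in SAT-adjusted mean PT score between schools that did and did not use that procedure, holding the other five procedures constant.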


Model 4 is identical to Model 3 except that it adds self-reported effort, $EFF_{ij}$, at the student level.


Level 1: $PT_{ij} = \beta_{0j} + \beta_{1j}(SAT_{ij} - \overline{SAT}) + \beta_{2j}(EFF_{ij} - \overline{EFF}) + r_{ij}$
Level 2:
$\beta_{0j} = \gamma_{00} + \gamma_{01}MAN_j + \gamma_{02}FAC_j + \gamma_{03}PAY_j + \gamma_{04}PER_j + \gamma_{05}RAF_j + \gamma_{06}CRE_j + u_{0j}$
$\beta_{1j} = \gamma_{10}$
$\beta_{2j} = \gamma_{20}$


Grand-mean centering on SAT scores and effort has the effect of making $\beta_{0j}$ equal the mean PT
score at school j adjusted for SAT scores and effort.


As demonstrated in previous research, effort may mediate the relationship between 
administrative procedures and CLA performance (Cole, Bergin, and Whittaker, 2010). When that 
is the case, one should expect coefficients for dummy variables to be significant in Model 3 but 
not when self-reported effort is included (Model 4). Model 5 used self-reported effort as the 
outcome to directly investigate the associations between administrative procedures and
self-reported effort.


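Level 1: $EFF_{ij} = \beta_{0j} + \beta_{1j}(SAT_{ij} - \overline{SAT}) + r_{ij}$
Level 2:
$\beta_{0j} = \gamma_{00} + \gamma_{01}MAN_j + \gamma_{02}FAC_j + \gamma_{03}PAY_j + \gamma_{04}PER_j + \gamma_{05}RAF_j + \gamma_{06}CRE_j + u_{0j}$
$\beta_{1j} = \gamma_{10}$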


RESULTS


Table 1 shows the percentages of schools that employed the administration procedures 
examined in this study. Note that schools employed these procedures more frequently with 
seniors than with freshmen. This finding is consistent with common anecdotal reports from 
schools suggesting that seniors are more challenging to recruit than freshmen. 


Table 1
Percentages of schools using administration procedures

Procedure                             Freshmen   Seniors
Mandatory testing (MAN)                  18%       33%
Faculty involvement (FAC)                60%       68%
Payment (PAY)                            50%       70%
Performance-based incentives (PER)       15%       19%
Raffle (RAF)                             24%       30%
Course extra credit (CRE)                21%       21%


Table 2 provides descriptive statistics on the outcome measures. As would be expected due
to learning in college, the seniors scored higher on the CLA, but some of this difference can be
accounted for by differences in ability, which were apparent from the mean difference in SAT 
scores. On average, freshmen and seniors reported similar levels of effort, with more than 70% 
of students reporting that they mainly did their best or tried their best. The correlation between 
self-reported effort and CLA performance was .24 (p < .001) in both samples, and this result is 
fairly typical (Steedle, 2014). 




Table 2
Sample demographics and descriptive statistics

                                         Freshmen   Seniors
Female                                      60%       60%
White                                       72%       74%
English spoken at home                      91%       91%
Mean SAT (400-1600 scale)                  1086      1117
Mean self-reported effort (1-4 scale)       2.9       3.0
Mean CLA PT                                1080      1194
Made little effort                           4%        4%
Made some effort                            25%       23%
Mainly did my best                          44%       44%
Tried my best                               27%       29%
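
The mean effort values in Table 2 can be checked against the response percentages in the same table. The short sketch below does so under the assumption that the four response options are coded 1 through 4 in the order listed in the Measures section; the table reports a 1-4 scale, but the exact coding is not stated. The variable and function names are illustrative.

```python
# Assumed coding of the effort item (order taken from the Measures section).
EFFORT_CODES = {
    "Made little effort": 1,
    "Made some effort": 2,
    "Mainly did my best": 3,
    "Tried my best": 4,
}

# Response percentages as reported in Table 2.
FRESHMEN = {"Made little effort": 4, "Made some effort": 25,
            "Mainly did my best": 44, "Tried my best": 27}
SENIORS = {"Made little effort": 4, "Made some effort": 23,
           "Mainly did my best": 44, "Tried my best": 29}


def mean_effort(percentages):
    """Weighted mean of the effort codes, using response percentages as weights."""
    total = sum(percentages.values())
    return sum(EFFORT_CODES[k] * p for k, p in percentages.items()) / total


def share_best(percentages):
    """Percentage choosing 'Mainly did my best' or 'Tried my best'."""
    return percentages["Mainly did my best"] + percentages["Tried my best"]


for label, pcts in (("Freshmen", FRESHMEN), ("Seniors", SENIORS)):
    print(label, round(mean_effort(pcts), 1), share_best(pcts))
```

Under that coding, the weighted means round to the reported 2.9 and 3.0, and the top two categories account for 71% of freshmen and 73% of seniors, in line with the statement that more than 70% mainly did or tried their best.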


Results from the analysis of freshmen are shown in Table 3. Models 1 and 2 indicated that 25%
of the variance in CLA scores was between schools, but only 7% remained between schools after
controlling for SAT scores. In Model 3, only payment had a significant positive coefficient
(p < .05), although the negative coefficient for mandatory testing approached significance
(p = .07). When self-reported effort was added in Model 4, only mandatory testing had a
coefficient that approached significance (p = .09). Significant coefficients for payment in
Models 3 and 5 but not Model 4 suggested effort as a mediator between paying students and test
performance. Model 5 also had a significant positive coefficient for performance-based
incentives (p < .01).


In the senior analysis, 20% of the CLA score variance was between schools, and 5% was 
between schools after controlling for SAT scores (Table 4). In Model 3, there was a significant 
negative coefficient for mandatory testing (p < .05) and a significant positive coefficient for 
performance-based incentives (p < .001). Only performance-based incentives had a significant 
coefficient (p < .001) in Model 4. Model 5 had significant coefficients for mandatory testing (p < 
.01), payment (p < .01), and performance-based incentives (p < .05). The comparison of Models
3, 4, and 5 reveals the possible mediating effect of effort between mandatory testing and
performance. 
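
The comparison logic described in the Analysis section (a coefficient that is significant in Models 3 and 5 but not in Model 4) can be written as a simple rule. The sketch below applies that rule to the significance patterns reported in the text above. It is a bookkeeping aid under the stated rule, not a formal test of mediation, and the function and dictionary names are illustrative.

```python
def consistent_with_mediation(sig_model3, sig_model4, sig_model5):
    """Apply the comparison rule from the Analysis section.

    A procedure fits the mediation pattern when it is associated with
    performance (Model 3) and with effort (Model 5), but is no longer
    significant once effort is controlled (Model 4).
    """
    return sig_model3 and sig_model5 and not sig_model4


# Significance (at p < .05) as reported in the Results text:
# (Model 3, Model 4, Model 5)
reported = {
    ("freshmen", "PAY"): (True, False, True),
    ("freshmen", "PER"): (False, False, True),
    ("seniors", "MAN"): (True, False, True),
    ("seniors", "PAY"): (False, False, True),
    ("seniors", "PER"): (True, True, True),
}

flags = {key: consistent_with_mediation(*sig) for key, sig in reported.items()}
for (group, procedure), flag in flags.items():
    print(group, procedure, flag)
```

Only payment for freshmen and mandatory testing for seniors fit the pattern. Performance-based incentives for seniors remain significant after effort is controlled, so effort does not fully account for that association.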
DISCUSSION


This study investigated associations between test administration procedures, self-reported 
effort, and performance for a low-stakes test of college outcomes. The analysis of senior data 
revealed a significant negative association between requiring students to take the CLA and 
performance after controlling for other variables (nearly significant for freshmen). That is, 
results are consistent with the notion that mandating testing, rather than soliciting volunteers, 
can negatively impact student effort and performance. 


Some have suggested that institutional assessment programs would benefit from faculty 
buy-in and involvement (Steedle, 2010). In this study, faculty involvement in recruitment was 
not significantly associated with effort or performance, after controlling for other variables. 
However, the extent of faculty involvement was unknown. 


Of the incentives, payment for task completion (money, gift certificate, or university cash)
and performance-based incentives (e.g., the top five scorers receive $100) showed positive
associations with effort and performance, after controlling for other variables. Raffle entries 
and course extra credit did not. Performance-based incentives for freshmen and payment for 
seniors were correlated with self-reported effort but not test performance. In some cases, 
there was evidence that effort acted as the mediator between administration procedures and 
performance. For example, results are consistent with the notion that mandatory testing of 
senior students reduced effort, thereby reducing performance. In a similar fashion, higher
performance among freshmen who were paid can be accounted for by the association between 
payment and effort. 


The practices of paying students to participate and offering performance-based incentives 
appear to support testing effort and performance, thereby strengthening the validity of
test-score interpretations on low-stakes tests of college outcomes. In contrast, the practice
of mandating testing (e.g., in randomly selected freshman seminars or senior capstone
courses) appears to depress testing effort and performance, which could negatively impact 
validity. This conclusion could be subjected to experimentation in future studies. Until then, 
it is recommended that students be solicited to volunteer for testing and offered financial 
incentives, while making every effort to ensure that the tested sample is representative of the 
larger student body. 